Abstract



Gap Hamming Distance is a well-studied problem in communication complexity, in which Alice
and Bob have to decide whether the Hamming distance between their respective $n$-bit inputs is less than
$n/2 - \sqrt{n}$ or greater than $n/2 + \sqrt{n}$. We show that every $k$-round bounded-error communication protocol
for this problem sends a message of at least $\Omega(n/(k^2 \log k))$ bits. This lower bound has an exponentially
better dependence on the number of rounds than the previous best bound, due to Brody and Chakrabarti.
Our communication lower bound implies strong space lower bounds on algorithms for a number of data
stream computations, such as approximating the number of distinct elements in a stream.

Subsequent to this result, the bound has been improved by some of us to the optimal $\Omega(n)$, independent
of $k$, by using different techniques.

1 Introduction

1.1 The communication complexity of the Gap-Hamming problem

Communication complexity studies the communication requirements of distributed computing. In its
simplest and best-studied setting, two players, Alice and Bob, receive inputs $x$ and $y$, respectively, and are
required to compute some function $f(x,y)$. Clearly, for most functions $f$, the two players need to
communicate to solve this problem. The basic question of communication complexity is the minimal amount
of communication needed. By abstracting away from the resources of local computation time and space,
communication complexity gives us a bare-bones but elegant model of distributed computing. It is useful
and interesting for its own sake, but also one of our main sources of lower bounds in many other models of
computation, such as data structures, circuit size and depth, Turing machines, VLSI, and algorithms for data
streams. The basic results are excellently covered in the book of Kushilevitz and Nisan [KN97], but many
more fundamental results have appeared since its publication in 1997.

One of the few basic problems whose randomized communication complexity is not yet well understood
is the Gap Hamming Distance (GHD) problem, defined as follows.



*Department of Computer Science, Dartmouth College, Hanover, NH 03755. Supported in part by NSF Grant CCF-0448277.
Part of this work was done while the author was visiting CWI and Tel Aviv University.

†Department of Computer Science, Dartmouth College, Hanover, NH 03755. Supported in part by NSF Grants CCF-0448277
and IIS-0916565 and a McLane Family Fellowship.

‡Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Supported by the Israel Science
Foundation, by the European Commission under the Integrated Project QAP funded by the IST directorate as Contract Number 015848,
by the Wolfson Family Charitable Trust, and by a European Research Council (ERC) Starting Grant.

§UC Berkeley, vidick@eecs.berkeley.edu. Supported by ARO Grant W911NF-09-1-0440 and NSF Grant CCF-0905626. Part
of this work was done while the author was visiting CWI and Tel Aviv University.

¶CWI Amsterdam, rdewolf@cwi.nl. Supported by a Vidi grant from the Netherlands Organization for Scientific Research
(NWO).






GHD: Alice receives input $x \in \{0,1\}^n$ and Bob receives input $y \in \{0,1\}^n$, with the promise
that $|\Delta(x,y) - n/2| \ge \sqrt{n}$, where $\Delta$ denotes Hamming distance. Decide whether $\Delta(x,y) < n/2$
or $\Delta(x,y) > n/2$.

Mind the gap between $n/2 - \sqrt{n}$ and $n/2 + \sqrt{n}$, which is what makes this problem interesting and useful.
We will be concerned with the communication complexity of randomized protocols that solve GHD. A gap
size of $\Theta(\sqrt{n})$ is the natural choice: it is where a $\Theta(1)$ fraction of the inputs lie inside the promise area,
and as we'll see below, it is precisely this choice of gap size that has strong implications for lower bounds
on streaming algorithms. Moreover, understanding the complexity of the $\sqrt{n}$-gap version can be shown to
imply a complete understanding of the GHD problem for all gaps. The communication complexity of the
gapless version, where there is no promise on the inputs, can easily be seen to be linear (for instance by
a reduction from disjointness). The gap makes the problem easier, and the question is how it affects the
communication complexity: does it remain linear?

Protocols for GHD and more general problems can be obtained by sampling. Suppose for instance that
either $\Delta(x,y) \le (1/2 - \gamma)n$ or $\Delta(x,y) \ge (1/2 + \gamma)n$. Choosing an index $i \in [n]$ at random, the predicate
$[x_i \ne y_i]$ is a coin flip with heads probability $\le 1/2 - \gamma$ in the first case and $\ge 1/2 + \gamma$ in the second. It
is known that flipping such a coin $O(1/\gamma^2)$ times suffices to distinguish these two cases with probability at
least 2/3. If we use shared randomness to choose $O(1/\gamma^2)$ indices, we obtain a one-round bounded-error
protocol with communication $O(1/\gamma^2)$ bits. In particular, for GHD (where $\gamma = 1/\sqrt{n}$), the communication
is $O(n)$ bits, which is no better than the trivial upper bound of $n$ when Alice just sends $x$ to Bob.
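
For illustration only, the sampling protocol described above can be sketched in Python. The function name `sampled_ghd`, the constant `c` standing in for the constant hidden in $O(1/\gamma^2)$, and the use of a shared `random.Random` object to model public coins are all assumptions of this sketch rather than anything fixed by the text.

```python
import random


def sampled_ghd(x, y, gamma, shared_rng, c=4):
    """One-round sampling sketch for the gapped Hamming distance problem.

    x, y       : equal-length lists of bits (Alice's and Bob's inputs)
    gamma      : promised relative gap around n/2
    shared_rng : random.Random instance modelling the shared randomness
    c          : hypothetical constant for the O(1/gamma^2) sample size
    Returns 1 if the sampled disagreement rate exceeds 1/2, else 0.
    """
    n = len(x)
    m = max(1, int(c / gamma ** 2))                 # number of sampled coordinates
    indices = [shared_rng.randrange(n) for _ in range(m)]
    alice_message = [x[i] for i in indices]         # Alice's single m-bit message
    # Bob compares Alice's bits with his own bits at the same shared indices.
    disagreements = sum(1 for bit, i in zip(alice_message, indices) if bit != y[i])
    return 1 if 2 * disagreements > m else 0


if __name__ == "__main__":
    zeros = [0] * 64
    ones = [1] * 64
    print(sampled_ghd(zeros, zeros, 0.125, random.Random(1)))  # distance 0
    print(sampled_ghd(zeros, ones, 0.125, random.Random(1)))   # distance n
```

The message length is `m`, which grows as $1/\gamma^2$; with $\gamma = 1/\sqrt{n}$ this is already of order $n$, matching the remark that sampling does not beat the trivial protocol for GHD.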

What about lower bounds? Indyk and Woodruff [IW03] managed to prove a linear lower bound for the
case of one-round protocols for GHD, where there is only one message from Alice to Bob (see also [Woo04,
JKS08]). However, going beyond one-round bounds turned out to be quite a difficult problem. Recently,
Brody and Chakrabarti [BC09] obtained linear lower bounds for all constant-round protocols:

Theorem 1. [BC09] Every $k$-round bounded-error protocol for GHD communicates at least $n/2^{O(k^2)}$ bits.

In fact their bound is significant as long as the number of rounds is $k \le c_0\sqrt{\log n}$, for a universal
constant $c_0$. Regarding lower bounds that hold irrespective of the number of rounds, an easy reduction
gives an $\Omega(\sqrt{n})$ lower bound (which is folklore): take an instance of the gapless version of the problem
on $x, y \in \{0,1\}^{\sqrt{n}}$ and "repeat" $x$ and $y$ $\sqrt{n}$ times each. This blows up the gap from 1 to $\sqrt{n}$, giving an
instance of GHD on $n$ bits. Solving this $n$-bit instance of GHD solves the $\sqrt{n}$-bit instance of the gapless
problem. Since we have a linear lower bound for the latter, we obtain a general $\Omega(\sqrt{n})$ bound for GHD.^1
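
The repetition step of this folklore reduction is easy to check mechanically. The following Python sketch is illustrative only; the helper names `hamming` and `repeat_instance` are hypothetical and not taken from the paper.

```python
def hamming(x, y):
    """Hamming distance between two equal-length bit lists."""
    return sum(1 for a, b in zip(x, y) if a != b)


def repeat_instance(x, y):
    """Turn an m-bit instance into an (m*m)-bit instance by repeating each string m times."""
    m = len(x)
    return x * m, y * m   # list repetition concatenates m copies


if __name__ == "__main__":
    x = [0, 1, 1, 0]            # m = 4, so n = 16 and sqrt(n) = 4
    y = [1, 1, 0, 1]
    big_x, big_y = repeat_instance(x, y)
    m, n = len(x), len(big_x)
    # The distance scales by m, so a deviation of 1 from m/2 becomes a deviation of m = sqrt(n) from n/2.
    print(hamming(x, y) - m / 2, hamming(big_x, big_y) - n / 2)
```

Since every coordinate is copied $\sqrt{n}$ times, the distance to the midpoint is multiplied by $\sqrt{n}$, which is exactly the gap required by GHD on $n$ bits.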

1.2 Our results 

Our main result is an improvement of the bound of Brody and Chakrabarti, with an exponentially better 
dependence on the number of rounds: 



^1 In fact the same proof lower-bounds the quantum communication complexity; a linear quantum lower bound for the gapless
version follows easily from Razborov's work [Raz02] and the observation that $\Delta(x,y) = |x| + |y| - 2|x \wedge y|$. However, as Brody
and Chakrabarti observed, in the quantum case this $\sqrt{n}$ lower bound is essentially tight: there is a bounded-error quantum protocol
that communicates $O(\sqrt{n}\log n)$ qubits. This also implies that lower bound techniques which apply to quantum protocols, such as
discrepancy, factorization norms [LS07, LS08], and the pattern matrix method [She08], cannot prove better bounds for classical
protocols.



Theorem 2. Every $k$-round bounded-error protocol for GHD sends a message of length $\Omega(n/(k^2 \log k))$.

In fact we get a bound for the more general problem of distinguishing distance $\Delta(x,y) \le (1/2 - \gamma)n$
from $\Delta(x,y) \ge (1/2 + \gamma)n$, as long as $\gamma = \Omega(1/\sqrt{n})$: for this problem every $k$-round protocol sends a
message of length $\Omega(1/(\gamma^2 k^2 \log k))$.



Like the result of [BC09], our lower bound deteriorates with the number of rounds. Also like their result,
our proof is based on round elimination, an important framework for proving communication lower bounds.
Our proof contains an important insight into this framework that we now explain.

A communication problem usually involves a number of parameters, such as the input size, an error
bound, and in our case the gap size. The round elimination framework consists of showing that a $k$-round
protocol solving a communication problem for a class $C$ of parameters can be turned into a $(k-1)$-round
protocol for an easier class $C'$, provided the message communicated in the first round is short. This fact
is then applied repeatedly to obtain a 0-round protocol (say), for some nontrivial class of instances. The
resulting contradiction can then be recast as a communication lower bound. Historically, the easier class $C'$
has contained smaller input sizes^2 than those in $C$.

In contrast to previous applications of round elimination, we manage to avoid shrinking the input length:
the simplification will instead come from a slight deterioration in the error parameter. Here is how this
works. If Alice's first message is short, then there is a specific message and a large set $A$ of inputs on which
Alice would have sent that message. Roughly speaking, we can use the largeness of $A$ to show that almost
any input $x$ for Alice is close to $A$ in Hamming distance. Therefore, Alice can "move" $x$ to its nearest
neighbor, $\tilde{x}$, in $A$: this makes her first message redundant, as it is constant for all inputs $\tilde{x} \in A$. Since $x$ and
$\tilde{x}$ have small Hamming distance, it is likely that both pairs $(x,y)$ and $(\tilde{x},y)$ are on the same side of the gap,
i.e. have the same GHD value. Hence the correctness of the new protocol, which is one round shorter, is
only mildly affected by the move. Eliminating all $k$ rounds in this manner, while carefully keeping track of
the accumulating errors, yields a lower bound of $\Omega(n/(k^4 \log^2 k))$ on the maximum message length of any
$k$-round bounded-error protocol for GHD.

Notice that this lower bound is slightly weaker than the above-stated bound of $\Omega(n/(k^2 \log k))$. To
obtain the stronger bound, we leave the purely combinatorial setting and analyze a version of GHD on the
sphere:^3 Alice's input is a unit vector $x \in \mathbb{R}^n$ and Bob's input is a unit vector $y \in \mathbb{R}^n$, with the promise
that either $x \cdot y \ge 1/\sqrt{n}$ or $x \cdot y \le -1/\sqrt{n}$ (as we show below in Section 2, this version and the Boolean
one are essentially equivalent in terms of communication complexity). Alice's input is now close to the
large, constant-message set $A$ in Euclidean distance. The rest of the proof is as outlined above, but the final
bound is stronger than in the combinatorial proof for reasons that are discussed in Section 2.2. Although this
proof uses arguments from high-dimensional geometry, such as measure concentration, it arguably remains
conceptually simpler than the one in [BC09].

Related work. The round elimination technique was formally identified and named in Miltersen et al. [MNSW98]
and dates back even further, at least to Ajtai's lower bound for predecessor data structures [Ajt88]. For us,
the most relevant previous use of this technique is in the result by Brody and Chakrabarti [BC09], where a
weaker lower bound is proved on GHD.

Their proof, as ours, identifies a large subset $A$ of inputs on which Alice sends the same message. The
"largeness" of $A$ is used to identify a suitable subset of $(n/3)$ coordinates such that Alice can "lift" any
$(n/3)$-bit input $x$, defined on these coordinates, to some $n$-bit input $\tilde{x} \in A$. In the resulting protocol for
$(n/3)$-bit inputs, the first message is now constant, hence redundant, and can be eliminated.

^2 In fact, the classes $C$ and $C'$ are often designed in such a way that an instance in $C$ is a "direct sum" of several independent
instances in $C'$.

^3 The idea of going to the sphere was also used by Jayram et al. [JKS08] for a simplified one-round lower bound. As we will
see in Section 2, doing so is perhaps even more natural than working with the combinatorial version; in particular it is then easy to
make GHD into a dimension-independent problem.

The input size thus shrinks from n to n/3 in one round elimination step. As a result of this constant- 
factor shrinkage, the Brody-Chakrabarti final lower bound necessarily decays exponentially with the number 
of rounds. Our proof crucially avoids this shrinkage of input size by instead considering the geometry of the 
set A, and exploiting the natural invariance of the GHD predicate to small perturbations of the inputs. 

Remark. This round elimination result was obtained in July 2009. Soon after, in August 2009, the bound
was actually improved by some of us to the optimal $\Omega(n)$, independent of the number of rounds; see [CR09].
However, the techniques used are completely different, and as such we feel our result and its proof are of
independent interest.

1.3 Applications to streaming 

The study of gapped versions of the Hamming distance problem by Indyk and Woodruff [IW03] was
motivated by the streaming model of computation, in particular the problem of approximating the number of
distinct elements in a data stream. For many data stream problems, including the distinct elements problem, 
the goal is to output a multiplicative approximation of some real-valued quantity. Usually, both random- 
ization and approximation are required. When both are allowed, there are often remarkably space-efficient 
solutions. 

As Indyk and Woodruff showed, communication lower bounds for gapped versions of the Hamming
distance problem imply space lower bounds on algorithms that output the number of distinct elements in
a data stream up to a multiplicative approximation factor $1 \pm \gamma$. The reduction from the gapped version
of Hamming distance works as follows. Alice converts her $n$-bit string $x = x_1 x_2 \cdots x_n$ into a stream of
tuples $\sigma = \langle (1,x_1), (2,x_2), \ldots, (n,x_n) \rangle$. Bob converts $y$ into $\tau = \langle (1,y_1), (2,y_2), \ldots, (n,y_n) \rangle$ in a similar
fashion. Using a streaming algorithm for the distinct elements problem, Alice processes $\sigma$ and sends the
memory contents to Bob, who then processes $\tau$ starting from where Alice left off. In this way, they estimate
the number of distinct elements in $\sigma \circ \tau$. Note that each element in $\sigma$ is unique, and that elements in $\tau$ are
distinct from elements in $\sigma$ precisely when $x_i \ne y_i$. Hence, an accurate approximation ($\gamma = \Omega(1/\sqrt{n})$
is required) for the number of distinct elements in $\sigma \circ \tau$ gives an answer to GHD. This reduction can be
extended to multi-pass streaming algorithms in a natural way: when Bob is finished processing $\tau$, he sends
the memory contents back to Alice, who begins processing $\sigma$ a second time. Generalizing, it is easy to see
that a $p$-pass streaming algorithm gives a $(2p-1)$-round communication protocol, where each message is
the memory contents of the streaming algorithm. Accordingly, a lower bound on the length of the largest
message of $(2p-1)$-round protocols gives a space lower bound for the $p$-pass streaming algorithm.
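
The one-pass version of this reduction can be traced in a few lines of Python. This is only a sketch: an exact `set` stands in for the memory of a (space-efficient, approximate) streaming algorithm, and the function names are hypothetical.

```python
def to_stream(bits):
    """Convert a bit string into the stream of (position, bit) tuples, positions starting at 1."""
    return [(i + 1, b) for i, b in enumerate(bits)]


def distinct_after_both_streams(x, y):
    """Exact stand-in for a distinct-elements algorithm run on sigma followed by tau."""
    memory = set()                 # plays the role of the streaming algorithm's memory
    for item in to_stream(x):      # Alice processes sigma
        memory.add(item)
    # Here Alice would send the memory contents to Bob (the single message of the protocol).
    for item in to_stream(y):      # Bob continues with tau
        memory.add(item)
    return len(memory)


def ghd_answer_from_count(n, distinct_count):
    """The distinct count equals n + Hamming distance, so compare the excess with n/2."""
    return 1 if distinct_count - n > n / 2 else 0


if __name__ == "__main__":
    x = [0, 1, 1, 0]
    y = [1, 1, 0, 1]
    count = distinct_after_both_streams(x, y)
    print(count, ghd_answer_from_count(len(x), count))
```

The key identity is that the number of distinct tuples in $\sigma \circ \tau$ equals $n + \Delta(x,y)$; an approximate counter only needs to be accurate enough to tell on which side of $3n/2$ this value lies.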

Thus, the one-round linear lower bound by Indyk and Woodruff [IW03] yields the desired
(one-pass) space lower bound for the streaming problem. Similarly, our new communication lower bounds
imply $\Omega(1/(\gamma^2 p^2 \log p))$ space lower bounds for $p$-pass algorithms for the streaming problem. This bound
is $\Omega(1/\gamma^{2-o(1)})$ for all $p = n^{o(1)}$ and improves on previous bounds for all $p = o(n^{1/4}/\sqrt{\log n})$.

Organization of the paper. We start with some preliminaries in Section 2, including a discussion of
the key measure concentration results that we will use, both for the sphere and for the Hamming cube, in
Section 2.2. In Section 3 we prove our main result, while in Section 4 we give the simple combinatorial
proof of the slightly weaker result mentioned above.

2 Preliminaries



Notation. For $x, y \in \mathbb{R}^n$, let $d(x,y) := \|x - y\|$ be the Euclidean distance between $x$ and $y$. For $z \in \mathbb{R}$,
define $\mathrm{sgn}(z) := 0$ if $z > 0$, and $\mathrm{sgn}(z) := 1$ otherwise. For a set $S \subseteq \mathbb{R}^n$, let $d(x,S)$ be the infimum over
all $y \in S$ of $d(x,y)$. The unique rotationally-invariant probability distribution on the $n$-dimensional sphere
$S^{n-1}$ is the Haar measure, which we denote by $\nu$. When we say that a vector is taken from the uniform
distribution over a measurable subset of the sphere, we will always mean that it is distributed according to
the Haar measure, conditioned on being in that subset.

Define the max-cost of a communication protocol to be the length of the longest single message sent
during an execution of the protocol, for a worst-case input. We use $R^k_\varepsilon(f)$ to denote the minimal max-cost
amongst all two-party, $k$-round, public-coin protocols that compute $f$ with error probability at most $\varepsilon$ on
every input (here a "round" is one message). See [KN97] for precise definitions.

2.1 Problem definition 

We will prove our lower bounds for the problem $\mathcal{GHV}_{d,\gamma}$, where $d$ is an integer and $\gamma > 0$. In this problem
Alice receives a $d$-dimensional unit vector $x$, and Bob receives a $d$-dimensional unit vector $y$, with the
promise that $|x \cdot y| \ge \gamma$. Alice and Bob should output $\mathrm{sgn}(x \cdot y)$.

We show that $\mathcal{GHV}_{n,1/\sqrt{n}}$ has essentially the same randomized communication complexity as the
problem GHD that we defined in the introduction. Generalizing that definition, for any $g > 0$ define the problem
$\mathrm{GHD}_{n,g}$, in which the input is formed of two $n$-bit strings $x$ and $y$, with the promise that $|\Delta(x,y) - n/2| \ge g$,
where $\Delta$ is the Hamming distance. Alice and Bob should output 0 if $\Delta(x,y) < n/2$ and 1 otherwise.

The following proposition shows that for any $\sqrt{n} \le g \le n$, the problems $\mathrm{GHD}_{n,g}$ and $\mathcal{GHV}_{d,\gamma}$ are
essentially equivalent from the point of view of randomized communication complexity (with shared
randomness) as long as $d \ge n$ and $\gamma = \Theta(g/n)$. This proposition also shows that the randomized communication
complexity of $\mathcal{GHV}_{d,\gamma}$ is independent of the dimension $d$ of the input, as long as it is large enough with
respect to the gap $\gamma$.

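Proposition 3. For every $\varepsilon > 0$, there is a constant $C_0 = C_0(\varepsilon)$ such that for all integers $k, d > 0$ and
every $\sqrt{n} \le g \le n$, we have

$$R^k_{2\varepsilon}(\mathcal{GHV}_{d,\,C_0 g/n}) \;\le\; R^k_{\varepsilon}(\mathrm{GHD}_{n,g}) \;\le\; R^k_{\varepsilon}(\mathcal{GHV}_{n,\,2g/n}).$$
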
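Proof. We begin with the right inequality. The idea is that a $\mathrm{GHD}_{n,g}$ protocol can be obtained by applying
a given $\mathcal{GHV}$ protocol to a suitably transformed input. Let $x, y \in \{0,1\}^n$ be two inputs to $\mathrm{GHD}_{n,g}$.
Define $\tilde{x} = ((-1)^{x_i}/\sqrt{n})_{i=1,\ldots,n}$ and $\tilde{y} = ((-1)^{y_i}/\sqrt{n})_{i=1,\ldots,n}$. Then $\tilde{x}, \tilde{y} \in S^{n-1}$. Moreover,
$\tilde{x} \cdot \tilde{y} = 1 - 2\Delta(x,y)/n$. Therefore, if $\Delta(x,y) \ge n/2 + g$ then $\tilde{x} \cdot \tilde{y} \le -2g/n$, and if $\Delta(x,y) \le n/2 - g$ then
$\tilde{x} \cdot \tilde{y} \ge 2g/n$. This proves $R^k_\varepsilon(\mathrm{GHD}_{n,g}) \le R^k_\varepsilon(\mathcal{GHV}_{n,2g/n})$.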

For the left inequality, let $x$ and $y$ be two unit vectors (in any dimension) such that $|x \cdot y| \ge \gamma$, where
$\gamma = C_0 g/n$. Note that since $g \ge \sqrt{n}$, we have $n = \Omega(\gamma^{-2})$. Using shared randomness, Alice and Bob pick
a sequence of vectors $w_1, \ldots, w_n$, each independently and uniformly drawn from the unit sphere. Define
two $n$-bit strings $\tilde{x} = (\mathrm{sgn}(x \cdot w_i))_{i=1,\ldots,n}$ and $\tilde{y} = (\mathrm{sgn}(y \cdot w_i))_{i=1,\ldots,n}$. Let $\alpha = \cos^{-1}(x \cdot y)$ be the angle
between $x$ and $y$. Then a simple argument (used, e.g., by Goemans and Williamson [GW95]) shows that the
probability that a random unit vector $w$ is such that $\mathrm{sgn}(x \cdot w) \ne \mathrm{sgn}(y \cdot w)$ is exactly $\alpha/\pi$. This means that
for each $i$, the bits $\tilde{x}_i$ and $\tilde{y}_i$ differ with probability $\frac{1}{\pi}\cos^{-1}(x \cdot y)$, independently of the other bits of $\tilde{x}$ and $\tilde{y}$.
The first few terms in the Taylor series expansion of $\cos^{-1}$ are $\cos^{-1}(z) = \frac{\pi}{2} - z - \frac{z^3}{6} + O(z^5)$. Hence,
for each $i$, $\Pr_{w_i}(\tilde{x}_i \ne \tilde{y}_i) = 1/2 - \Theta(x \cdot y)$, and these events are independent for different $i$. Choosing $C_0$
sufficiently large, with probability at least $1 - \varepsilon$, the Hamming distance between $\tilde{x}$ and $\tilde{y}$ is at most $n/2 - g$
if $x \cdot y \ge \gamma$, and it is at least $n/2 + g$ if $x \cdot y \le -\gamma$. □
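
Both transformations used in this proof can be sketched in Python. The code below is illustrative only and its function names are hypothetical. One assumption is made explicit: instead of sampling $w_i$ uniformly from the unit sphere, the sketch samples standard Gaussian vectors, whose direction is uniform on the sphere; only the sign of the inner product is used, so the scaling does not matter.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgn(z):
    """Sign convention of the paper: 0 for positive values, 1 otherwise."""
    return 0 if z > 0 else 1


def cube_to_sphere(bits):
    """Map an n-bit string to the unit vector ((-1)^{x_i} / sqrt(n))_i."""
    n = len(bits)
    return [(-1) ** b / math.sqrt(n) for b in bits]


def sphere_to_cube(x, y, n, shared_rng):
    """Round two unit vectors to n-bit strings using n shared random hyperplanes."""
    d = len(x)
    x_bits, y_bits = [], []
    for _ in range(n):
        # Gaussian vector: a stand-in for a uniformly random direction (assumption of this sketch).
        w = [shared_rng.gauss(0.0, 1.0) for _ in range(d)]
        x_bits.append(sgn(dot(x, w)))
        y_bits.append(sgn(dot(y, w)))
    return x_bits, y_bits


if __name__ == "__main__":
    x, y = [0, 1, 1, 0], [1, 1, 0, 1]
    delta = sum(1 for a, b in zip(x, y) if a != b)
    u, v = cube_to_sphere(x), cube_to_sphere(y)
    print(dot(u, v), 1 - 2 * delta / len(x))     # the two values should agree
    xb, yb = sphere_to_cube(u, v, 2000, random.Random(0))
    frac = sum(1 for a, b in zip(xb, yb) if a != b) / len(xb)
    print(frac, math.acos(dot(u, v)) / math.pi)  # empirical rate vs. alpha / pi
```

The first printed pair checks the identity $\tilde{x} \cdot \tilde{y} = 1 - 2\Delta(x,y)/n$; the second compares the empirical disagreement rate with $\alpha/\pi$, which it should approach as the number of hyperplanes grows.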

2.2 Concentration of measure 

It is well known that the Haar measure $\nu$ on a high-dimensional sphere is tightly concentrated around the
equator (around any equator, in fact), which makes it a fairly counterintuitive phenomenon. The original phrasing
of this phenomenon, usually attributed to P. Lévy [Lev51], goes by showing that among all subsets of the
sphere, the one with the smallest "boundary" is the spherical cap $S_x^\gamma = \{y \in S^{n-1} : x \cdot y \ge \gamma\}$. The
following standard volume estimate will prove useful (see, e.g., [Bal97], Lemma 2.2).

Fact 4. Let $x \in S^{n-1}$ and $\gamma > 0$. Then $\nu(S_x^\gamma) \le e^{-\gamma^2 n/2}$.

Given a measurable set $A$, define its $t$-boundary $A_t := \{x \in S^{n-1} : d(x,A) \le t\}$, for any $t > 0$. At the
core of our results will be the standard fact that, for any not-too-small set $A$, the set $A_t$ contains almost all
the sphere, even for moderately small values of $t$.

Fact 5 (Concentration of measure on the sphere). For any measurable $A \subseteq S^{n-1}$ and any $t > 0$,

$$\Pr(x \in A)\,\Pr(x \notin A_t) \le 4e^{-t^2 n/4}, \qquad (1)$$

where the probabilities are taken according to the Haar measure on the sphere.

Proof. The usual measure concentration inequality for the sphere (Theorem 14.1.1 in [Mat02]) says that for
any set $B \subseteq S^{n-1}$ of measure at least 1/2 and any $t' > 0$,

$$\Pr(x \notin B_{t'}) \le 2e^{-(t')^2 n/2}.$$

This suffices to prove the fact if $\Pr(x \in A) \ge 1/2$. Assume for the rest of the proof that $\Pr(x \in A) < 1/2$.
Let $t_0$ be such that $A_{t_0}$ has measure 1/2; such a $t_0$ exists by continuity. Applying measure concentration to
$B = A_{t_0}$ gives

$$\Pr(x \notin A_{t_0 + t'}) \le 2e^{-(t')^2 n/2}, \qquad (2)$$

for all $t' > 0$, while applying it to $B = \overline{A_{t_0}}$ yields

$$\Pr(x \in A_{t_0 - t''}) \le \Pr(x \notin B_{t''}) \le 2e^{-(t'')^2 n/2} \qquad (3)$$

for all $t'' \le t_0$, since $A_{t_0 - t''}$ is included in the complement of $(\overline{A_{t_0}})_{t''}$. Taking $t'' = t_0$ gives us
$\Pr(x \in A) \le 2e^{-t_0^2 n/2}$. If $t \le t_0$ then this suffices to prove the inequality. Otherwise, set $t' := t - t_0$ in (2)
and $t'' := t_0$ in (3) and multiply the two inequalities to obtain the required bound, by using that
$t_0^2 + (t - t_0)^2 \ge t^2/2$ (which holds since $2t_0^2 + t^2/2 - 2t\,t_0 = (\sqrt{2}\,t_0 - t/\sqrt{2})^2 \ge 0$). □

Why the sphere? In Section 4 we give a proof of a slightly weaker lower bound than the one in our
main result by using measure concentration facts on the Hamming cube only. We present those useful facts
now, together with a brief discussion of the differences, in terms of the concentration of measure phenomenon,
between the Haar measure on the sphere and the uniform distribution over the hypercube. These differences
point to the reasons why the proof of Section 4 gives an inferior bound.

Similarly to our definition of a spherical cap, we can define a "Hamming cap" $T_x^c$ on the Hamming cube
as $T_x^c = \{y \in \{0,1\}^n : \Delta(x,y) \le n/2 - c\sqrt{n}\}$. The analogue of Fact 4 is then given by the usual Chernoff
bound:

Fact 6. For all $c > 0$, we have $2^{-n}|T_x^c| \le e^{-2c^2}$.

A result similar to Lévy's, attributed to Harper [Har66], states that among all subsets (of the Hamming cube) of
a given size, the cap is the one with the smallest boundary. Following a similar proof as for Fact 5, one can
get the following statement for the Hamming cube (see e.g. Corollary 4.4 in [Bar05]):

Fact 7 (Concentration of measure on the Hamming cube). Let $A \subseteq \{0,1\}^n$ be any set, and define $A_c =
\{x \in \{0,1\}^n : \exists y \in A,\ \Delta(x,y) \le c\sqrt{n}\}$. Then

$$\Pr(x \in A)\,\Pr(x \notin A_c) \le e^{-c^2}, \qquad (4)$$

where the probabilities are taken according to the uniform distribution on the Hamming cube.

To compare these two statements, embed the Hamming cube in the sphere by mapping $x \in \{0,1\}^n$ to the
vector $v_x = \frac{1}{\sqrt{n}}((-1)^{x_i})_{i \in [n]}$. Two strings of Hamming distance $c\sqrt{n}$ are mapped to vectors with Euclidean
distance $2\sqrt{c}/n^{1/4}$, so that inequality (4) is much weaker than inequality (1). In particular we see that, while
on the sphere most points are at distance roughly $1/\sqrt{n}$ from any set of measure half, if we are restricted
to the Hamming cube then very few points are at a corresponding Hamming distance of 1 from, say, the
set of all strings with fewer than $n/2$ 1s, which has measure roughly 1/2 in the cube. This difference is
crucial: it indicates that the $n$-dimensional cube is too rough an approximation of the $n$-dimensional sphere
for our purposes, perhaps explaining why our combinatorial bound in Section 4 yields a somewhat weaker
dependence on the number of rounds.

3 Main result 

Our main result is the following. 

Theorem 8. Let $0 \le \varepsilon \le 1/50$. There exist constants $C, C'$ depending only on $\varepsilon$ such that the following
holds for any $\gamma > 0$ and any integers $n \ge \varepsilon^2/(4\gamma^2)$ and $k \le C'/(\gamma \ln(1/\gamma))$: if $P$ is a randomized $\varepsilon$-error
$k$-round communication protocol for $\mathcal{GHV}_{n,\gamma}$ then some message has length at least $C/(\gamma^2 k^2 \ln k)$ bits.

Using Proposition 3 we get a lower bound for the Hamming cube version $\mathrm{GHD} = \mathrm{GHD}_{n,\sqrt{n}}$:

Corollary 9. Any $\varepsilon$-error $k$-round randomized protocol for GHD communicates $\Omega(n/(k^2 \ln k))$ bits.

This follows from Theorem 8 when $k = o(\sqrt{n}/\log n)$. If $k$ is larger, then the bound stated in the
Corollary is in fact weaker than the general $\Omega(\sqrt{n})$ lower bound which we sketched in the introduction.

3.1 Proof outline 

We now turn to the proof of Theorem 8. Let $\varepsilon$, $\gamma$ and $n$ be as in the statement of the theorem. Since lowering
$n$ only makes the $\mathcal{GHV}_{n,\gamma}$ problem easier, for the rest of this section we assume that $n := \varepsilon^2/(4\gamma^2)$ is fixed,
and for simplicity of notation we write $\mathcal{GHV}_\gamma$ for $\mathcal{GHV}_{n,\gamma}$.

Measurability. Before proceeding with the proof, we first need to handle a small technicality arising from
the continuous nature of the input space: namely, that the distributional protocol might make decisions
based on subsets of the input space that are not measurable. To make sure that this does not happen, set
$\delta = \gamma/4$ and consider players Alice and Bob who first round their inputs to the closest vector in a fixed
$\delta$-net, and then proceed with an $\varepsilon$-error protocol for $\mathcal{GHV}_{\gamma/2}$. Since the rounding changes $x \cdot y$ by at most $\gamma/2$,
provided Alice and Bob are given valid inputs to $\mathcal{GHV}_\gamma$ they will succeed with probability $1 - \varepsilon$. Hence any
randomized $\varepsilon$-error protocol for $\mathcal{GHV}_{\gamma/2}$ can be transformed into a randomized $\varepsilon$-error protocol for $\mathcal{GHV}_\gamma$
with the same communication, but which initially rounds its inputs to a discrete set. We prove a lower bound
on the latter type of protocol. This will ensure that all sets encountered in the proof are measurable.

Distributional complexity. By Yao's principle it suffices to lower-bound the distributional complexity,
i.e., to analyze deterministic protocols that are correct with probability $1 - \varepsilon$ under some input distribution.
As our input distribution for $\mathcal{GHV}_\gamma$ we take the distribution that is uniform over the inputs satisfying the
promise $|x \cdot y| \ge \gamma$.

Given our choice of $n$, Claim 11 below guarantees that the $\nu \times \nu$-measure of non-promise inputs is at
most $\varepsilon$. Hence it will suffice to lower-bound the distributional complexity of protocols making error at most
$2\varepsilon$ under the distribution $\nu \times \nu$. We define an $\varepsilon$-protocol to be a deterministic communication protocol for
$\mathcal{GHV}_{n,\gamma}$ whose error under the distribution $\nu \times \nu$ is at most $\varepsilon$, where we say that the protocol makes an error
if $P(x,y) \ne \mathrm{sgn}(x \cdot y)$.

We prove a lower bound on the maximum length of a message sent by any $\varepsilon$-protocol, via round
elimination. The main reduction step is given by the following technical lemma:

Lemma 10 (Round Elimination on the sphere). Let $0 \le \varepsilon \le 1/25$, $\gamma > 0$, $n = \varepsilon^2/(4\gamma^2)$, and $1 \le
\kappa \le k$. Assume there is a $\kappa$-round $\varepsilon$-protocol $P$ such that the first message has length bounded as
$c_1 \le \frac{C_1 n}{k^2 \ln k} - 7\ln(2k)$, where $C_1$ is a universal constant. Then there is a $(\kappa - 1)$-round $\varepsilon'$-protocol $Q$ (obtained
by eliminating the first message of $P$), where

$$\varepsilon' \le \left(1 + \frac{1}{k}\right)\varepsilon + \frac{1}{16k}.$$

Before proving this lemma in Section 3.2 we show how it implies Theorem 8.

Proof of Theorem 8. We will show that in any $k$-round $(2\varepsilon)$-protocol, there is a message sent of length at
least $C_1 n/(k^2 \ln k) - 7\ln(2k)$. The discussion in the "Distributional complexity" paragraph above shows
this suffices to prove the theorem, by setting $C = C_1\varepsilon^2/8$, and choosing $C'$ small enough so that the bound
on $k$ in the statement of the theorem implies that $7\ln(2k) \le C_1 n/(2k^2 \ln k)$.

Let $P$ be a $k$-round $(2\varepsilon)$-protocol, and assume for contradiction that each round of communication
uses at most $C_1 n/(k^2 \ln k) - 7\ln(2k)$ bits. The recurrence $\varepsilon_\kappa = (1 + 1/k)\varepsilon_{\kappa-1} + 1/(16k)$, $\varepsilon_0 = 2\varepsilon$,
is easily solved to $\varepsilon_\kappa = (1 + 1/k)^\kappa(2\varepsilon + 1/16) - 1/16$, so that applying Lemma 10 $k$ times leads to a
0-round protocol for $\mathcal{GHV}_\gamma$ that errs with probability at most $\varepsilon_k \le e\,(2\varepsilon + 1/16) - 1/16 \le 1/4$ over the
input distribution $\nu \times \nu$. We have reached a contradiction: such a protocol needs communication and hence
cannot be 0-round. Hence $P$ must send a message of length at least $C_1 n/(k^2 \ln k) - 7\ln(2k)$, which is what
needed to be shown. □
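
The error recurrence used in this proof is simple enough to check numerically. The sketch below (function names hypothetical) iterates the recurrence $\varepsilon_\kappa = (1 + 1/k)\varepsilon_{\kappa-1} + 1/(16k)$ from $\varepsilon_0 = 2\varepsilon$ and compares it with the closed form $(1 + 1/k)^\kappa(2\varepsilon + 1/16) - 1/16$.

```python
def iterate_error(eps, k, rounds):
    """Apply the recurrence e -> (1 + 1/k) e + 1/(16k) `rounds` times, starting from 2*eps."""
    e = 2 * eps
    for _ in range(rounds):
        e = (1 + 1 / k) * e + 1 / (16 * k)
    return e


def closed_form_error(eps, k, rounds):
    """Closed form obtained by shifting the recurrence by its fixed point -1/16."""
    return (1 + 1 / k) ** rounds * (2 * eps + 1 / 16) - 1 / 16


if __name__ == "__main__":
    eps = 1 / 50
    for k in (1, 2, 10, 1000):
        print(k, iterate_error(eps, k, k), closed_form_error(eps, k, k))
```

Because $(1 + 1/k)^k < e$, the final error after $k$ eliminations stays below $e\,(2\varepsilon + 1/16) - 1/16$, which is less than $1/4$ whenever $\varepsilon \le 1/50$.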

3.2 The main reduction step 

Proof of Lemma 10. Let $P(x,y)$ denote the output of the protocol on input $x,y$. Define $x \in S^{n-1}$ to
be $\delta$-good if $\Pr_{y \sim \nu}(P(x,y) \text{ errs} \mid x) \le \delta\varepsilon$. By Markov's inequality, at least a $(1 - 1/\delta)$-fraction of $x$
(distributed according to $\nu$) are good. For a given message $m$, let $A_m$ be the set of all good $x$ on which
Alice sends $m$ as her first message. The sets $A_m$, over all messages $m \in \{0,1\}^{c_1}$, form a partition of
the set of good $x$. Define $m_1 := \mathrm{argmax}_m\, \nu(A_m)$ and let $A := A_{m_1}$. Setting $\delta = 1 + 1/k$, we have
$\nu(A) \ge (1 - \frac{1}{\delta})\,2^{-c_1} \ge e^{-c_1 - \ln(k+1)}$.

We now define protocol $Q$. Alice receives an input $x$, Bob receives $y$, both distributed according to $\nu$.
Alice computes the point $\tilde{x} \in A$ that is closest to $x$, and Bob sets $\tilde{y} := y$. They run protocol $P(\tilde{x}, \tilde{y})$ without
Alice sending the first message, so Bob starts and proceeds as if he received the fixed message $m_1$ from
Alice.

To prove the lemma, it suffices to bound the error probability $\varepsilon'$ of $Q$ with input $x, y$ distributed according
to $\nu \times \nu$. Define $d_1 = 2\sqrt{(c_1 + 6\ln(2k) + 2)/n}$ and consider the following bad events:

• BAD1 : $d(x,A) > d_1$

• BAD2 : $P(\tilde{x},y) \ne \mathrm{sgn}(\tilde{x} \cdot y)$

• BAD3 : $d(x,A) \le d_1$ but $\mathrm{sgn}(\tilde{x} \cdot y) \ne \mathrm{sgn}(x \cdot y)$.

If none of those events occurs, then protocol $Q$ outputs the correct answer. We bound each of them
separately, and will conclude by upper bounding $\varepsilon'$ with a union bound.

The first bad event can be easily bounded using the measure concentration inequality from Fact 5. Since
$x$ is uniformly distributed in $S^{n-1}$ and $\Pr(A) \ge e^{-c_1 - \ln(k+1)}$,

$$\Pr(\mathrm{BAD}_1) \le 4e^{-d_1^2 n/4 + c_1 + \ln(k+1)} \le 4e^{-5\ln(2k) - 2} \le \frac{1}{32k}.$$

The second bad event has probability bounded by $(1 + 1/k)\varepsilon$ by the goodness of $\tilde{x}$. Now consider event
BAD3. Without loss of generality, we may assume that $x \cdot y > 0$ but $\tilde{x} \cdot y < 0$ (the other case is
treated symmetrically). In order to bound BAD3, we will use the two following claims. The first shows that
the probability that $x \cdot y$ is close to 0 for a random $x$ and $y$ is small. The second uses measure concentration
to show that, if $x \cdot y$ is not too close to 0, then moving $x$ to the nearby $\tilde{x}$ is unlikely to change the sign of the
inner product.

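Claim 11. Let $x, y$ be distributed according to $\nu$. For any real $a > 0$,

$$\Pr(0 \le x \cdot y \le a) \le a\sqrt{n}.$$

Proof. Letting $\omega_n$ be the volume of the $n$-dimensional Euclidean unit ball, we can write (see e.g., [BGK+98],
Lemma 5.1)

$$\Pr(0 \le x \cdot y \le a) = \frac{(n-1)\,\omega_{n-1}}{n\,\omega_n} \int_0^a (1 - t^2)^{\frac{n-3}{2}}\,dt \le a\sqrt{n},$$

where we used $\frac{(n-1)\,\omega_{n-1}}{n\,\omega_n} < \sqrt{n}$. □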

Claim 12. Let $x, \tilde{x}$ be two fixed unit vectors at distance $\|x - \tilde{x}\| = d \in [0, d_1]$, and $0 < a \le 1/(4\sqrt{n})$. Let
$y$ be taken according to $\nu$. Then

$$\Pr(x \cdot y \ge a \,\wedge\, \tilde{x} \cdot y < 0) \le e^{-a^2 n/(8 d_1^2)}.$$

Proof. Note that $x \cdot \tilde{x} = 1 - \|x - \tilde{x}\|^2/2 = 1 - d^2/2$. Since the statement of the claim is rotationally
invariant, we may assume without loss of generality that

$$x = (1, 0, 0, \ldots, 0),$$

$$\tilde{x} = (1 - d^2/2,\ -\sqrt{d^2 - d^4/4},\ 0, \ldots, 0),$$

$$y = (y_1, y_2, y_3, \ldots, y_n).$$

Therefore, $y_1 \ge a$ when $x \cdot y \ge a$. Note that

$$\tilde{x} \cdot y = \tilde{x}_1 y_1 + \tilde{x}_2 y_2 \ge (1 - d^2/2)\,a - \sqrt{d^2 - d^4/4}\; y_2.$$

Hence the event $x \cdot y \ge a \,\wedge\, \tilde{x} \cdot y < 0$ implies

$$y_2 > \frac{(1 - d^2/2)\,a}{\sqrt{d^2 - d^4/4}} \ge \frac{a}{2d},$$

where we used the fact that $d \le d_1 \le 1$, given our assumption on $c_1$. By Fact 4, the probability that, when $y$
is sampled from $\nu$, $y_2$ is larger than $a/(2d)$ is at most $e^{-a^2 n/(8d^2)} \le e^{-a^2 n/(8d_1^2)}$. Hence the probability that both
$x \cdot y \ge a$ and $\tilde{x} \cdot y < 0$ happen is at most as much. □

Setting $a = 1/(128k\sqrt{n})$, by Claim 11 we find that the probability that $0 \le x \cdot y \le a$ is at most
$1/(128k)$. Furthermore, the probability that $x \cdot y \ge a$ and $\tilde{x} \cdot y < 0$ is at most
$\exp\!\left(-\frac{n}{2^{19} k^2 (c_1 + 6\ln(2k) + 2)}\right)$
by Claim 12. This bound is less than $1/(128k)$ given our assumption on $c_1$, provided $C_1$ is a small enough
constant. Putting both bounds together, we see that

$$\Pr(x \cdot y > 0 \,\wedge\, \tilde{x} \cdot y < 0) \le 1/(64k).$$

The event that $x \cdot y < 0$ but $\tilde{x} \cdot y > 0$ is bounded by $1/(64k)$ in a similar manner. Hence, $\Pr(\mathrm{BAD}_3) \le
1/(32k)$. Taking the union bound over all three bad events concludes the proof of the lemma. □



4 A simple combinatorial proof 

In this section we present a combinatorial proof of the following: 

Theorem 13. Let $0 \le \epsilon \le 1/50$. There exists a constant $C''$ depending on $\epsilon$ only, such that the following holds for any $g \le C''\sqrt{n}$ and $k \le n^{1/4}/(1024 \log n)$: if $P$ is a randomized $\epsilon$-error $k$-round communication protocol for $\mathrm{GHD}_{n,g}$, then some message has length at least $n/((512k)^4 \log^2 k)$.

Even though this is a weaker result than Theorem 8, its proof is simpler and is based on concentration of measure in the Hamming cube rather than on the sphere (we refer to Section 1.2 for a high-level comparison of the two proofs). Interestingly, the dependence on the number of rounds that we obtain is quadratically worse than that of the proof using concentration on the sphere. We do not know if this can be improved using the same technique.

We proceed as in Section 3.1, observing that it suffices to lower-bound the distributional complexity of $\mathrm{GHD}_{n,g}$ under a distribution uniform over the inputs satisfying the promise $|\Delta(x, y) - n/2| \ge g$. In fact, as we did before, by taking $C''$ small enough we can guarantee that the number of non-promise inputs is at most $\epsilon 2^{2n}$. Hence it will suffice to lower-bound the distributional complexity of protocols making error at most $2\epsilon$ under the uniform input distribution. We define an $\epsilon$-protocol to be a deterministic communication protocol for GHD whose distributional error under the uniform distribution is at most $\epsilon$. The following is the analogue of Lemma 10, from which the proof of Theorem 13 follows as in Section 3.1.

Lemma 14 (Round Elimination on the Hamming cube). Let $0 \le \epsilon \le 1/25$ and $k, \kappa$ be two integers such that $k \ge 128$ and $1 \le \kappa \le k \le n^{1/4}/(1024 \log n)$. Assume that there is a $\kappa$-round $\epsilon$-protocol $P$ such that the first message has length bounded by $c_1 \le n/((512k)^4 \log^2 k)$. Then there exists a $(\kappa - 1)$-round $\epsilon'$-protocol $Q$ (obtained by eliminating the first message of $P$) where

$$\epsilon' \le \left(1 + \frac{1}{k}\right)\epsilon + \frac{1}{16k}.$$



Proof. Define $x \in \{0,1\}^n$ to be good if $\Pr(P(x, y) \text{ errs} \mid x) \le (1 + 1/k)\epsilon$. By Markov's inequality, at least a $\frac{1}{k+1}$-fraction of $x \in \{0,1\}^n$ are good. For a given message $m$, let $A_m := \{\text{good } x : \text{Alice sends } m \text{ given } x\}$. The sets $A_m$, over all messages $m \in \{0,1\}^{c_1}$, together form a partition of the set of good $x$. Define $m_1 := \mathrm{argmax}_m |A_m|$, and let $A := A_{m_1}$. By the pigeonhole principle, we have

$$|A| \ge \frac{1}{k+1}\, 2^{n - c_1}.$$
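The selection of the set $A$ can be illustrated by brute force on a toy instance. In the sketch below, `first_message` and `error_rate` are hypothetical stand-ins for Alice's first message and for the conditional error $\Pr(P(x,y) \text{ errs} \mid x)$ of some concrete protocol; the toy values of $n$, $c_1$, $\epsilon$ and $k$ are likewise assumptions chosen only so that the enumeration stays small.

```python
from collections import defaultdict
from itertools import product

def largest_good_class(n, first_message, error_rate, eps, k):
    """Return (m1, A): the first message shared by the most good inputs, and those inputs.

    first_message(x) -> hashable message sent by Alice on input x (hypothetical)
    error_rate(x)    -> Pr_y[P(x, y) errs | x]                    (hypothetical)
    """
    threshold = (1 + 1 / k) * eps
    classes = defaultdict(list)
    for x in product((0, 1), repeat=n):
        if error_rate(x) <= threshold:          # x is good
            classes[first_message(x)].append(x)
    m1 = max(classes, key=lambda m: len(classes[m]))
    return m1, classes[m1]

# Toy instance: the first message is the first c1 bits of x, and the
# per-input error is 2*eps on odd-weight inputs, so the average error is eps.
n, c1, eps, k = 10, 2, 0.01, 4
m1, A = largest_good_class(
    n,
    first_message=lambda x: x[:c1],
    error_rate=lambda x: 2 * eps if sum(x) % 2 else 0.0,
    eps=eps,
    k=k,
)
# Pigeonhole bound from the proof: |A| >= 2^(n - c1) / (k + 1).
assert len(A) >= 2 ** (n - c1) / (k + 1)
```

In this toy instance half of all inputs are good, which is consistent with the $\frac{1}{k+1}$ guarantee from Markov's inequality but far from tight.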

We now define protocol $Q$. Alice receives an input $x$, Bob receives $y$, uniformly distributed. Alice computes the string $\tilde{x} \in A$ that is closest to $x$ in Hamming distance, and Bob sets $\tilde{y} := y$. They run protocol $P(\tilde{x}, \tilde{y})$ without Alice sending the first message, so Bob starts and proceeds as if he received the fixed message $m_1$ from Alice.
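The rounding step of protocol $Q$ can be sketched as follows. The callable `P_without_first_message` is a hypothetical simulator for the remaining rounds of $P$, and the brute-force nearest-neighbour search is only for illustration; the communication model does not charge for Alice's local computation.

```python
def hamming(u, v):
    """Hamming distance between two equal-length bit sequences."""
    return sum(a != b for a, b in zip(u, v))

def nearest_in(A, x):
    """A string of A closest to x in Hamming distance (ties broken by order in A)."""
    return min(A, key=lambda z: hamming(z, x))

def run_Q(P_without_first_message, A, m1, x, y):
    """One run of Q on inputs (x, y).

    P_without_first_message(x_tilde, y_tilde, m1) is a hypothetical callable that
    simulates the remaining rounds of P, with Bob acting as if Alice had sent m1.
    """
    x_tilde = nearest_in(A, x)   # Alice's local rounding step, no communication
    y_tilde = y                  # Bob keeps his input unchanged
    return P_without_first_message(x_tilde, y_tilde, m1)

# Usage with a dummy simulator that just echoes its arguments.
A_toy = [(0, 0, 0, 0), (1, 1, 1, 1)]
out = run_Q(lambda xt, yt, m: (xt, yt, m), A_toy, "m1", (0, 1, 0, 0), (1, 0, 1, 0))
assert out == ((0, 0, 0, 0), (1, 0, 1, 0), "m1")
```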

To prove the lemma, it suffices to bound the error probability $\epsilon'$ of $Q$ under the uniform distribution. Define $d_1 = 9\sqrt{n}/((1024k)^2 \log k)$. As in the proof of Lemma 10, we consider the following bad events:

• $\mathrm{BAD}_1$: $\Delta(x, \tilde{x}) > d_1\sqrt{n}$

• $\mathrm{BAD}_2$: $P(\tilde{x}, y) \ne \mathrm{GHD}(\tilde{x}, y)$

• $\mathrm{BAD}_3$: $\Delta(x, \tilde{x}) \le d_1\sqrt{n}$ but $\mathrm{GHD}(\tilde{x}, y) \ne \mathrm{GHD}(x, y)$

If none of those events occurs, then protocol $P$ outputs the correct answer. We bound each of them separately, and will conclude by a union bound.

The first bad event is easily bounded using Fact 7, which implies that

$$\Pr(\mathrm{BAD}_1) \le \frac{1}{k^2} \le \frac{1}{32k},$$

given our assumptions on $c_1$ and $k$. The second bad event is bounded by $(1 + 1/k)\epsilon$, by definition of the set $A$.

We now turn to $\mathrm{BAD}_3$. The event that $\mathrm{GHD}(\tilde{x}, y) \ne \mathrm{GHD}(x, y)$ only depends on the relative distances between $x$, $\tilde{x}$, and $y$, so we may apply a shift to assume that $x = (0, \ldots, 0)$. Without loss of generality, we assume that $\Delta(\tilde{x}, y) \ge n/2$ and $|y| < n/2$ (the error bound when $\Delta(\tilde{x}, y) < n/2$ and $|y| \ge n/2$ is proved in a symmetric manner). Note that, since $y$ is uniformly random (subject to $|y| < n/2$), with probability at least $1 - 1/(128k)$, we have $|y| \le n/2 - \sqrt{n}/(128k)$. Hence we may assume that this holds with an additive loss of at most $1/(128k)$ in the error. Now, since $\Delta(\tilde{x}, y) = |\tilde{x}| + |y| - 2|\tilde{x} \cap y|$, the event $\Delta(\tilde{x}, y) \ge n/2$ is equivalent to

$$|\tilde{x} \cap y| \le \frac{|\tilde{x}| + |y| - n/2}{2}.$$

It is clear that the worst case in this statement is for $|y| = n/2 - \sqrt{n}/(128k)$ and $|\tilde{x}| = \Delta(x, \tilde{x}) = d_1\sqrt{n}$. By symmetry, the probability that this event happens is the same as if we fix any $y$ of the correct weight, and $\tilde{x}$ is a random string of weight $d_1\sqrt{n}$. Since the expected intersection size is $|\tilde{x}|/2 - d_1/(128k)$, by Hoeffding's inequality (see e.g. the bound on the tail of the hypergeometric distribution given in [Chv79]), for $a = \sqrt{n}/(256k) - d_1/(128k)$,

$$\Pr\left(|\tilde{x} \cap y| \le \frac{|\tilde{x}| + |y| - n/2}{2}\right) = \Pr\left(|\tilde{x} \cap y| \le \mathbb{E}[|\tilde{x} \cap y|] - a\right) \le e^{-2a^2/(d_1\sqrt{n})}.$$

Given our choice of $d_1$ we have $a \ge 3\sqrt{n}/(4 \cdot 256k)$, and hence the upper bound is at most $1/k^2 \le 1/(128k)$, given our assumption on $k$. Applying the union bound over all bad events then yields the lemma. □
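The arithmetic behind the final Hoeffding step can be checked numerically. The sketch below assumes $d_1 = 9\sqrt{n}/((1024k)^2 \log k)$ with the natural logarithm and $a = \sqrt{n}/(256k) - d_1/(128k)$, and confirms that the exponent $2a^2/(d_1\sqrt{n})$ is at least $2\ln k$, so that the tail bound is at most $1/k^2 \le 1/(128k)$ once $k \ge 128$. The function name and the sampled pairs $(n, k)$ are illustrative choices; the pairs were picked to respect $k \le n^{1/4}/(1024 \log n)$.

```python
import math

def hoeffding_exponent(n: float, k: float) -> float:
    """Exponent 2 a^2 / (d1 sqrt(n)) appearing in the tail bound.

    Assumes d1 = 9 sqrt(n) / ((1024 k)^2 ln k) and
            a  = sqrt(n)/(256 k) - d1/(128 k), with natural logarithms.
    """
    d1 = 9 * math.sqrt(n) / ((1024 * k) ** 2 * math.log(k))
    a = math.sqrt(n) / (256 * k) - d1 / (128 * k)
    # Lower bound on a used in the text.
    assert a >= 3 * math.sqrt(n) / (4 * 256 * k)
    return 2 * a * a / (d1 * math.sqrt(n))

for n, k in ((2.0 ** 100, 128), (2.0 ** 120, 1000), (2.0 ** 160, 10 ** 6)):
    e = hoeffding_exponent(n, k)
    assert e >= 2 * math.log(k)
    assert math.exp(-e) <= 1 / k ** 2 <= 1 / (128 * k)
```

Since both $a$ and $d_1$ scale as $\sqrt{n}$, the exponent does not depend on $n$ at all; only the range of admissible $k$ does.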